Domain Based Classification of Punjabi Text Documents

Authors

  • Nidhi Krail
  • Vishal Gupta
Abstract

The dramatic increase in the amount of content available in digital form has made managing online textual data a problem. As a result, it has become necessary to classify large texts (documents) into specific classes. Text classification is a text mining technique used to classify text documents into predefined classes. Most text classification techniques classify documents either on the principle of probabilities or by matching terms with the class name. The objective of this work is to take the relationships among terms into account. To this end, a sports-specific ontology is manually created for the first time. Two new algorithms, Ontology Based Classification and Hybrid Approach, are proposed for Punjabi text classification. The experimental results show that Ontology Based Classification (85%) and Hybrid Approach (85%) provide better results.
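The abstract does not spell out how the ontology is used, so the following is only a minimal sketch of one plausible reading: each class in a small ontology carries a set of related terms, and a document is assigned to the class whose terms it mentions most often. The names `SPORTS_ONTOLOGY` and `classify`, the English placeholder terms, and the scoring rule are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter

# Hypothetical toy ontology: each sports class maps to a set of related terms.
# English placeholders stand in for Punjabi terms; none of this is from the paper.
SPORTS_ONTOLOGY = {
    "cricket": {"bat", "wicket", "bowler", "innings"},
    "hockey": {"stick", "goal", "penalty", "puck"},
    "football": {"goal", "striker", "penalty", "offside"},
}


def classify(tokens, ontology=SPORTS_ONTOLOGY):
    """Return the class whose ontology terms occur most often in tokens.

    Returns None when no ontology term appears at all. On a tie, the class
    listed first in the ontology wins (an arbitrary choice for this sketch).
    """
    counts = Counter(tokens)
    # Score each class by the total frequency of its related terms.
    scores = {
        cls: sum(counts[term] for term in terms)
        for cls, terms in ontology.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    doc = "the bowler took a wicket after a long innings".split()
    print(classify(doc))
```

Unlike matching against the class name alone, this kind of lookup lets a document be labelled "cricket" even when the word "cricket" never appears in it, which is the sort of term relationship the abstract refers to.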


Similar resources

Domain Based Classification of Punjabi Text Documents using Ontology and Hybrid Based Approach

Classification of text documents has become a need in today's world due to the increase in the availability of electronic data over the internet. Until now, no text classifier has been available for the classification of Punjabi documents. The objective of the work is to find the best text classifier for the Punjabi language. Two new algorithms, Ontology Based Classification and Hybrid Approach (which is the comb...


Punjabi Text Classification using Naïve Bayes, Centroid and Hybrid Approach

Punjabi Text Classification is the process of assigning predefined classes to unlabelled text documents. Because of the dramatic increase in the amount of content available in digital form, text classification has become an urgent need in order to manage digital data efficiently and accurately. Until now, no Punjabi text classifier has been available for Punjabi text documents. Therefore, in this paper, existi...


Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining: the unsupervised classification of similar documents into different groups. The most important step...


Domain Based Punjabi Text Document Clustering

Text Clustering is a text mining technique used to group similar documents into a single cluster, and to separate dissimilar documents, by using some sort of similarity measure. Popular clustering algorithms available for text clustering treat a document as a conglomeration of words. The syntactic or semantic relations between words are not given any consideration. Many different algorithms ...


Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...



Publication date: 2012